Since the weight data for the SP500 component stocks have not yet been included in the analysis, I manually assign equal weights to all component stocks. The sector percentages show the portion of the SP500 belonging to each sector. Under equal weighting, this is simply each sector's share of the component stocks by count.
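With equal weights, each of the N stocks carries weight 1/N, so a sector's weight is its number of stocks divided by N. The following is a minimal Python sketch. It assumes the sector labels are available as a plain list, and the helper name is hypothetical:

```python
from collections import Counter

def equal_weight_sector_shares(sectors):
    """Each of the N stocks gets weight 1/N, so a sector's share is count / N."""
    n = len(sectors)
    return {sector: count / n for sector, count in Counter(sectors).items()}

# Usage with a hypothetical five-stock universe
shares = equal_weight_sector_shares(
    ["Information Technology", "Information Technology", "Energy", "Financials", "Financials"]
)
```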
The asset analysis using the Capital Asset Pricing Model (CAPM) regresses the daily return of each component stock on the market daily return (baseline: SPY).
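The following is a minimal sketch of the CAPM regression for one stock. It assumes the stock and SPY daily returns are aligned NumPy arrays. The sketch fits r_stock = alpha + beta * r_SPY + error by ordinary least squares on raw returns, as described above. A textbook CAPM would subtract the risk-free rate first. The function name and the synthetic data are illustrative only:

```python
import numpy as np

def capm_alpha_beta(stock_returns, market_returns):
    """OLS fit of r_stock = alpha + beta * r_market + error on daily returns."""
    r_s = np.asarray(stock_returns, dtype=float)
    r_m = np.asarray(market_returns, dtype=float)
    # Design matrix: a column of ones (intercept -> alpha) and the market return (slope -> beta)
    X = np.column_stack([np.ones_like(r_m), r_m])
    (alpha, beta), *_ = np.linalg.lstsq(X, r_s, rcond=None)
    return alpha, beta

# Usage with synthetic data (hypothetical numbers, not the SP500 results)
rng = np.random.default_rng(0)
spy = rng.normal(0.0004, 0.01, size=1000)
stock = 0.0002 + 1.3 * spy + rng.normal(0.0, 0.005, size=1000)
alpha, beta = capm_alpha_beta(stock, spy)
```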
Unsurprisingly, the CAPM results show that the beta of every SP500 component stock is positive, because the baseline, SPY, tracks the SP500 index itself. In addition, more than half of the stocks have an alpha below zero, and only a few have an alpha greater than 0.001 (0.1% per day).
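The following small sketch shows the 0.001 threshold on an annual scale and reproduces the two counts above from a list of CAPM alphas. The helper is hypothetical, and it assumes 252 trading days with daily compounding:

```python
def summarize_alphas(alphas, threshold=0.001, trading_days=252):
    """Share of negative alphas, count above threshold, threshold compounded to a year."""
    n = len(alphas)
    share_negative = sum(a < 0 for a in alphas) / n
    n_above = sum(a > threshold for a in alphas)
    # Daily alpha compounded over an assumed 252-day trading year
    annualized_threshold = (1 + threshold) ** trading_days - 1
    return share_negative, n_above, annualized_threshold

# Usage with four made-up daily alphas
share_neg, n_above, annual = summarize_alphas([-0.002, -0.0005, 0.0002, 0.0015])
```

Compounding a 0.1% daily alpha over 252 trading days gives roughly 28.6% per year, so the threshold is large in annual terms.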
The asset analysis using the Fama-French five-factor (FF5) model regresses the daily return of each component stock on the five Fama-French factors. These are the market factor (market risk, MKT), the size factor (small minus big, SMB), the book-to-market factor (high minus low, HML), the profitability factor (robust minus weak, RMW), and the investment factor (conservative minus aggressive, CMA). In this model, each stock's risk exposure to the factors is described by its five beta coefficients. The alpha coefficient captures the part of the return that the factors do not explain. Moreover, as noted in the literature, the FF5 model does not explain well the returns of small stocks with high investment and low profitability.
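The same least-squares fit extends to the five factors. The following is a sketch that assumes the factor returns are already aligned with the stock's daily returns and expressed in the same units. The Kenneth French data library publishes factors in percent, so they would need dividing by 100. The output keys mirror the coefficient names used later in this document, and the helper itself is hypothetical:

```python
import numpy as np

FACTORS = ["MKT", "SMB", "HML", "RMW", "CMA"]

def ff5_coefficients(stock_returns, factor_returns):
    """OLS fit of r = alpha + sum_k beta_k * F_k + error.

    factor_returns: array of shape (n_days, 5), columns ordered as FACTORS.
    """
    r = np.asarray(stock_returns, dtype=float)
    F = np.asarray(factor_returns, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept (alpha)
    X = np.column_stack([np.ones(len(r)), F])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    out = {"alpha": coef[0]}
    out.update({f"Beta_{name}": b for name, b in zip(FACTORS, coef[1:])})
    return out

# Usage with synthetic factors (hypothetical values)
rng = np.random.default_rng(42)
factors = rng.normal(0.0, 0.01, size=(2000, 5))
true_betas = np.array([1.0, 0.5, -0.3, 0.2, 0.4])
returns = 0.0001 + factors @ true_betas + rng.normal(0.0, 0.005, size=2000)
coefs = ff5_coefficients(returns, factors)
```

Stacking the six values for every stock gives the coefficient matrix used for clustering.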
I first check whether the SP500 component stocks are clusterable using the Hopkins test. Two opposite conventions for the Hopkins statistic are in use. In the first, a value close to 1 indicates highly clustered data, random data tends to give values around 0.5, and uniformly distributed data tends to give values close to 0. In the second, used by some implementations, the scale is reversed, so values close to 0 indicate clusterable data. Using the six coefficients of the FF5 model (alpha and the five betas) of each stock as the training data, the resulting Hopkins statistic is 0.0881. This value points to more than one cluster only if the implementation follows the reversed convention. Under the first convention, it would instead suggest uniformly distributed data. The correlation plot of the daily returns does show clusters: there are two obvious clusters, and the larger of the two contains at least four smaller clusters. After going through all the criteria, the consensus suggests 10 as the number of clusters.
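The following is a NumPy sketch of the Hopkins statistic under the first convention above, where values near 1 mean clustered. Taking 1 - H gives the reversed convention. The sketch compares nearest-neighbour distances from m uniform random points in the data's bounding box with those from m sampled data points. Implementations differ in details; for example, some raise the distances to the power of the dimension. This is therefore an illustration, not the routine that produced 0.0881:

```python
import numpy as np

def hopkins_statistic(X, m=None, seed=0):
    """Hopkins statistic H; values near 1 suggest clustering, near 0.5 randomness."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = m if m is not None else max(1, n // 10)  # sample size, 10% of points by default
    rng = np.random.default_rng(seed)
    # m artificial points drawn uniformly within the bounding box of the data
    U = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    # m distinct real points sampled from the data
    idx = rng.choice(n, size=m, replace=False)
    # u: distance from each uniform point to its nearest data point
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # w: distance from each sampled point to its nearest *other* data point
    dw = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dw[np.arange(m), idx] = np.inf  # ignore each point's zero distance to itself
    w = dw.min(axis=1)
    return u.sum() / (u.sum() + w.sum())

# Usage: two tight synthetic clusters vs. uniform noise (placeholder data)
rng = np.random.default_rng(1)
clustered = np.vstack([rng.normal(0.0, 0.05, size=(100, 2)),
                       rng.normal(5.0, 0.05, size=(100, 2))])
uniform = rng.uniform(0.0, 1.0, size=(500, 2))
h_clustered = hopkins_statistic(clustered)
h_uniform = hopkins_statistic(uniform)
```

For the stocks, X would be the 505 x 6 matrix of FF5 coefficients.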
Projecting the 505 stocks as data points onto the first two principal components, we can see that most of the 10 resulting clusters are only narrowly separated from each other, while a few clusters overlap at the center of the two-component plot.
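The following is a sketch of the two-component projection. It assumes a matrix with one row per stock and one column per FF5 coefficient. Standardizing the columns first is an assumption, since the text does not say whether the coefficients were scaled:

```python
import numpy as np

def first_two_pcs(coefs):
    """Project rows (stocks) onto the first two principal components."""
    X = np.asarray(coefs, dtype=float)
    # Standardize each column (assumption); a constant column would divide by zero
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T
    explained = s**2 / np.sum(s**2)  # share of total variance per component
    return scores, explained[:2]

# Usage: 505 stocks x 6 coefficients of random placeholder data
rng = np.random.default_rng(7)
coef_matrix = rng.normal(size=(505, 6))
scores, explained = first_two_pcs(coef_matrix)
```

Coloring each row of `scores` by its cluster label gives the two-component plot described above.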
Hopkins statistic (R output): 0.08817222

[Figure: plot returned with the Hopkins statistic]
I use a bar chart to show each FF5 model coefficient by cluster. The result shows that four coefficients (alpha, Beta_CMA, Beta_HML and Beta_RMW) are crucial for the clustering, whereas Beta_MKT and Beta_SMB differ between clusters mainly in magnitude. The next violin plot also shows each FF5 model coefficient by cluster, but in terms of its distribution.
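Assuming each bar shows the cluster mean of a coefficient, the values behind the bar chart can be computed as below. The column order and the helper name are assumptions:

```python
import numpy as np

COEF_NAMES = ["alpha", "Beta_MKT", "Beta_SMB", "Beta_HML", "Beta_RMW", "Beta_CMA"]

def mean_coefficients_by_cluster(coefs, labels):
    """Return {cluster: {coefficient name: cluster mean}} for a bar chart."""
    coefs = np.asarray(coefs, dtype=float)
    labels = np.asarray(labels)
    table = {}
    for k in np.unique(labels):
        rows = coefs[labels == k]  # stocks assigned to cluster k
        table[k.item()] = dict(zip(COEF_NAMES, rows.mean(axis=0).tolist()))
    return table

# Usage with three hypothetical stocks in two clusters
coefs = [[1, 2, 3, 4, 5, 6],
         [3, 4, 5, 6, 7, 8],
         [10, 10, 10, 10, 10, 10]]
table = mean_coefficients_by_cluster(coefs, [0, 0, 1])
```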
The last plot is a scatter plot of expected return against the sum of the FF5 coefficients. By masking and unmasking individual cluster groups, we can see that the clusters are located around the origin.
As the last step, I treat each cluster as a SubPortfolio and apply a t-test to expected_return, Alpha, SumBeta and stand_dev, respectively, for each SubPortfolio. The results are highlighted in the following: